ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models
Protein language models (pLMs), pre-trained via causal language modeling on
protein sequences, have emerged as a promising tool for protein sequence design.
In real-world protein engineering, there are many cases where the amino acids in
the middle of a protein sequence must be optimized while the other residues are
kept fixed. Unfortunately, because existing pLMs generate left to right, they
can only modify suffix residues by prompting with prefix residues, which is
insufficient for the infilling task, where the whole surrounding context must
be considered. To find more effective pLMs for protein engineering, we design a
new benchmark, Secondary structureE InFilling rEcoveRy (SEIFER), which
approximates infilling sequence design scenarios. By evaluating existing models
on the benchmark, we reveal the weaknesses of existing language models and show
that language models trained via fill-in-middle transformation, called ProtFIM,
are more appropriate for protein engineering. Through exhaustive experiments and
visualizations, we also show that ProtFIM generates protein sequences with
decent protein representations. Comment: Preprint
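The fill-in-middle (FIM) transformation can be illustrated with a minimal sketch, assuming the common prefix-suffix-middle (PSM) ordering used for FIM training of causal language models: a sequence is split into prefix, middle, and suffix, and the middle is moved to the end so that a left-to-right model learns to generate it conditioned on both sides. The sentinel strings `<PRE>`, `<SUF>`, `<MID>` and all function names are hypothetical; ProtFIM's actual tokens and span-sampling scheme may differ.

```python
import random
from typing import Tuple

# Hypothetical sentinel tokens; a real FIM tokenizer defines its own special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def split_span(seq: str, rng: random.Random) -> Tuple[int, int]:
    """Sample a random non-empty middle span [start, end) of seq."""
    start, end = sorted(rng.sample(range(len(seq) + 1), 2))
    return start, end


def fim_transform(seq: str, start: int, end: int) -> str:
    """Rearrange seq into PSM order so a causal LM sees both sides before the middle."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("span out of range")
    prefix, middle, suffix = seq[:start], seq[start:end], seq[end:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def infill_prompt(prefix: str, suffix: str) -> str:
    """At design time, the model continues this prompt to generate the middle residues."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


if __name__ == "__main__":
    seq = "MKTAYIAKQR"  # toy sequence, not from the paper
    s, e = split_span(seq, random.Random(0))
    print(fim_transform(seq, s, e))
    print(fim_transform(seq, 3, 6))  # <PRE>MKT<SUF>AKQR<MID>AYI
```

Because the middle comes last, the ordinary left-to-right training objective applies unchanged; only the training data are rearranged.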
Solvent: A Framework for Protein Folding
Consistency and reliability are crucial for conducting AI research. Many
well-known research fields, such as object detection, compare and validate
methods with solid benchmark frameworks. Since AlphaFold2, protein folding has
entered a new phase, and many methods have been proposed based on components
of AlphaFold2. A unified research framework for protein folding, containing
implementations and benchmarks, is therefore important for comparing various
approaches consistently and fairly. To achieve this, we present Solvent, a
protein folding framework that supports major components of state-of-the-art
models through an off-the-shelf interface. Solvent contains different models
implemented in a unified codebase and supports training and evaluation of the
defined models on the same dataset. We benchmark well-known algorithms and
their components and provide experiments that give helpful insights into the
protein structure modeling field. We hope that Solvent will increase the
reliability and consistency of proposed models and improve efficiency in both
speed and cost, thereby accelerating protein folding research. The code is
available at https://github.com/kakaobrain/solvent, and the project will
continue to be developed. Comment: preprint, 8 pages
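One common way to put many models behind a single off-the-shelf interface is a name-to-class registry driven by a config dictionary. The sketch below illustrates that generic pattern only; `register_model`, `build_model`, and `ToyFolder` are hypothetical names and do not describe Solvent's actual API.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps a config "type" string to a model class.
MODEL_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_model(name: str) -> Callable[[type], type]:
    """Class decorator that makes a model constructible by name."""
    def decorator(cls: type) -> type:
        if name in MODEL_REGISTRY:
            raise KeyError(f"model name already registered: {name}")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def build_model(config: Dict[str, Any]) -> Any:
    """Instantiate the class named by config['type'] with the remaining keys as kwargs."""
    cfg = dict(config)  # copy so the caller's config is not mutated
    name = cfg.pop("type")
    if name not in MODEL_REGISTRY:
        raise KeyError(f"unknown model type: {name}")
    return MODEL_REGISTRY[name](**cfg)


@register_model("toy_folder")
class ToyFolder:
    """Placeholder standing in for a folding model; not a real architecture."""

    def __init__(self, num_blocks: int = 4) -> None:
        self.num_blocks = num_blocks
```

With this pattern, swapping a component means changing the `type` string in a config rather than editing training code, which is one way a framework can keep comparisons on the same dataset consistent.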
A community-powered search of machine learning strategy space to find NMR property prediction models
The rise of machine learning (ML) has created an explosion in the potential
strategies for using data to make scientific predictions. For physical
scientists wishing to apply ML strategies to a particular domain, it can be
difficult to assess in advance what strategy to adopt within a vast space of
possibilities. Here we outline the results of an online community-powered
effort to swarm search the space of ML strategies and develop algorithms for
predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in
molecules. Using an open-source dataset, we worked with Kaggle to design and
host a 3-month competition which received 47,800 ML model predictions from
2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced
models with comparable accuracy to our best previously published "in-house"
efforts. A meta-ensemble model constructed as a linear combination of the top
predictions achieves an accuracy that exceeds that of any individual model and
is 7–19× better than our previous state of the art. The results highlight the
potential of transformer architectures for predicting quantum mechanical (QM)
molecular properties.
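A linear-combination meta-ensemble can be sketched as follows, assuming the blend weights are fit by ordinary least squares (no intercept) on held-out predictions; the paper's actual fitting procedure may differ, and `fit_blend_weights` and `blend` are hypothetical names. Pure Python is used to keep the sketch dependency-free.

```python
from typing import List, Sequence


def fit_blend_weights(preds: Sequence[Sequence[float]], target: Sequence[float]) -> List[float]:
    """Least-squares weights w minimizing sum_i (sum_k w[k] * preds[k][i] - target[i])**2.

    preds[k] holds model k's held-out predictions; no intercept term is fitted.
    """
    k = len(preds)
    # Normal equations (P^T P) w = P^T y, built as plain lists.
    a = [[sum(x * z for x, z in zip(preds[r], preds[c])) for c in range(k)] for r in range(k)]
    b = [sum(x * y for x, y in zip(preds[r], target)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("model predictions are collinear; weights are not unique")
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * k
    for r in reversed(range(k)):
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, k))) / a[r][r]
    return w


def blend(preds: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Meta-ensemble prediction: weighted sum of the individual model predictions."""
    n = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(n)]
```

Fitting on held-out rather than training predictions matters here: otherwise the blend tends to over-weight whichever model overfits most.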
Deep learning models for predicting RNA degradation via dual crowdsourcing
Medicines based on messenger RNA (mRNA) hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition (‘Stanford OpenVaccine’) on Kaggle, involving single-nucleotide resolution measurements on 6,043 diverse 102–130-nucleotide RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504–1,588 nucleotides) with improved accuracy compared with previously published models. These results indicate that such models can represent in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for dataset creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales
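The statistic "41% of nucleotide-level predictions within experimental error" can be made concrete with a small sketch, assuming that "within error" means the absolute deviation from the measurement is at most that nucleotide's reported experimental error; the competition's exact criterion may differ, and `fraction_within_error` is a hypothetical name.

```python
from typing import Sequence


def fraction_within_error(pred: Sequence[float], truth: Sequence[float], err: Sequence[float]) -> float:
    """Fraction of nucleotides whose prediction lies within the measurement's error bar."""
    if not (len(pred) == len(truth) == len(err)) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, t, e in zip(pred, truth, err) if abs(p - t) <= e)
    return hits / len(pred)


if __name__ == "__main__":
    # Toy per-nucleotide degradation values, not real OpenVaccine data.
    print(fraction_within_error([0.1, 0.5, 0.9, 0.2], [0.15, 0.2, 0.85, 0.2], [0.1, 0.1, 0.1, 0.0]))
```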
Deep learning models for predicting RNA degradation via dual crowdsourcing
Messenger RNA-based medicines hold immense potential, as evidenced by their
rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA
molecules has been limited by their thermostability, which is fundamentally
limited by the intrinsic instability of RNA molecules to a chemical degradation
reaction called in-line hydrolysis. Predicting the degradation of an RNA
molecule is a key task in designing more stable RNA-based therapeutics. Here,
we describe a crowdsourced machine learning competition ("Stanford
OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on
6,043 diverse 102–130-nucleotide RNA constructs that were themselves solicited
through crowdsourcing on the RNA design platform Eterna. The entire experiment
was completed in less than 6 months, and 41% of nucleotide-level predictions
from the winning model were within experimental error of the ground truth
measurement. Furthermore, these models generalized to blindly predicting
orthogonal degradation data on much longer mRNA molecules (504-1588
nucleotides) with improved accuracy compared to previously published models.
Top teams integrated natural language processing architectures and data
augmentation techniques with predictions from previous dynamic programming
models for RNA secondary structure. These results indicate that such models are
capable of representing in-line hydrolysis with excellent accuracy, supporting
their use for designing stabilized messenger RNAs. The integration of two
crowdsourcing platforms, one for data set creation and another for machine
learning, may be fruitful for other urgent problems that demand scientific
discovery on rapid timescales
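Combining sequence input with secondary-structure predictions, as the top teams did, can be sketched as per-nucleotide features: a one-hot base encoding plus a pairing flag parsed from a dot-bracket structure string. This assumes the structure (for example, from a dynamic-programming folding model) is already available as a dot-bracket string containing only `(`, `)`, and `.`; the feature layout and the names `pair_partners` and `featurize` are hypothetical, not any team's actual pipeline.

```python
from typing import List

BASES = "ACGU"  # one-hot order for RNA nucleotides


def pair_partners(dot_bracket: str) -> List[int]:
    """Partner index for each position (-1 if unpaired), matched with a stack."""
    stack: List[int] = []
    partners = [-1] * len(dot_bracket)
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return partners


def featurize(seq: str, dot_bracket: str) -> List[List[float]]:
    """Per-nucleotide features: one-hot base (4 values) followed by a paired flag (1 value)."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    partners = pair_partners(dot_bracket)
    feats = []
    for base, partner in zip(seq, partners):
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        feats.append(one_hot + [1.0 if partner >= 0 else 0.0])
    return feats
```

A sequence model (for example, a recurrent or attention-based network) could then consume these per-position vectors; partner indices could also feed a pairwise attention bias, though that choice is likewise an assumption here.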
Reduction of diabetic atherosclerosis by RELM-α in an Ldlr-knockout mouse model
Thesis (Master's), Seoul National University Graduate School, Department of Veterinary Medicine, February 2017. Advisor: 이항. Resistin-like molecule (RELM)-α belongs to a family of secreted mammalian proteins with putative immunomodulatory functions. Recent studies have identified a role for RELM-α in the pathogenesis of hyperlipidemia-induced atherosclerosis. However, whether RELM-α regulates diabetic atherosclerosis is unknown. Here we report that RELM-α has anti-atherogenic effects and protects against diabetic atherosclerosis in low-density lipoprotein receptor-deficient (LDLR-/-) mice. The severity of the induced diabetic state was confirmed by monitoring blood glucose levels and body weight. RELM-α overexpression appears to have a cholesterol-lowering effect; in particular, cholesterol levels differed significantly in the diabetic group. After 8 weeks on a high-fat diet (HFD), total en face aortic lesion area was reduced in RELM-α-overexpressing (RELM-α Tg) mice compared with control mice in both the non-diabetic and diabetic groups. Plaque area in the aortic arch was also decreased in RELM-α Tg mice of both groups. We show that RELM-α overexpression has a greater anti-atherogenic effect, accompanied by lower cholesterol, in diabetic atherosclerosis than in the non-diabetic group. These findings identify RELM-α as a novel therapeutic target for treating diabetic atherosclerosis.
Introduction
Materials and Methods
1. Animal Studies and Diet
2. Genotyping
3. Antibodies
4. Immunoblotting
5. Streptozotocin Induced Diabetic Model and Mice Monitoring
6. Blood Analysis
7. Assessment of Atherosclerosis
8. Statistical Analysis
Results
1. The mice model of RELM-α overexpression
2. RELM-α overexpression reduces cholesterol in diabetic atherosclerosis mice
3. RELM-α overexpression reduces aortic arch plaque size
4. RELM-α overexpression decreases aortic root plaque size
List of Tables
List of Figures
Discussion
References
Abstract in Korean
Master